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Abstract 

We present a novel variational approach to image restoration (e.g., 
denoising, inpainting, labeling) that enables to complement established 
variational approaches with a histogram-based prior enforcing closeness 
of the solution to some given empirical measure. By minimizing a single 
objective function, the approach utilizes simultaneously two quite differ- 
ent sources of information for restoration: spatial context in terms of some 
smoothness prior and non-spatial statistics in terms of the novel prior uti- 
lizing the Wasserstein distance between probability measures. We study 
the combination of the functional lifting technique with two different re- 
laxations of the histogram prior and derive a jointly convex variational 
approach. Mathematical equivalence of both relaxations and optimality 
certificates are established. Additionally, we present an efficient algorith- 
mic scheme for the numerical treatment of the presented model. Experi- 
ments using the basic total-variation based denoising approach as a case 
study demonstrate our novel regularization approach. 

1 Introduction 

A broad range of powerful variational approaches to low-level image analysis 
tasks exist, like image denoising, image inpainting or image labeling [9], [8]. 
It is not straightforward however to incorporate directly into the restoration 
process statistical prior knowledge about the image class at hand. Particularly, 
handling global statistics as part of a single variational approach has not been 
considered so far. 

In the present paper, we introduce a class of variational approaches of the 
form 

miRi^+Fi^ + Wifi 11 ,^ ), (1) 

u 

where R(u)+F(u) is any basic variational approach consisting of a regularization 
term R(u), a data fidelity term F(u) and W(/jL u , denotes the histogram prior 
in terms of the Wasserstein distance between the histogram corresponding to 
the minimizing function u to be determined and some given histogram We 
require R(u) to be convex which holds naturally in the case of denoising. As 
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Figure 1: Denoising experiment of a noisy image (upper row, left side) taking 
into account statistical prior information through convex optimization (lower 
row, left side) infers the correct image structure and outperforms hand-tuned 
established variational restoration (lower row, right side). Enforcing global im- 
age statistics to be similar to those of the clean image (upper row, right side) 
gives our approach an advantage over methods not taking such information into 
account. 

a case study, we adopt for R(u) = TV(u), the Total Variation, see [2], and 
F(u) — J„ f(u(x), x)dx, where / can also be a nonconvex function. The basic 
ROF denoising approach of [TH] is included in this approach with f(u(x),x) — 
\\u(x) — uo(x)\\ 2 , where uq is the image to be denoised. 

Note that minimizing the first term in |TJ entails spatial regularization 
whereas the third Wasserstein term utilizes statistical information that is not 
spatially indexed in any way. Combining such different sources of information 
into a single variational approach has not been studied so far, to our knowl- 
edge. As an illustration, consider the academical example in figure [TJ Knowing 
the greyvalue distribution of the original image helps us in regularizing the 
noisy input image. We tackle the corresponding main difficulty in two differ- 
ent, mathematically plausible ways: by convex relaxations of (JTJ) in order to 
obtain a computationally tractable approach. Comparing these two relaxations 
- one may be tighter than the other one - reveals somewhat surprisingly math- 
ematical equivalence. Preliminary numerical experiments demonstrate that the 
relaxation seems to be tight enough so as to bias effectively variational restora- 
tion towards given statistical prior information. 
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Our paper is organized as follows. After briefly reviewing relevant work in 
section [2] we state our approach and the related variational problem in section 
[3j Sections |4] and [5] provide two apparently quite different relaxation approaches 
that enable to solve approximately the variational problem by convex optimiza- 
tion. Equivalence of these two relaxations is established in section [6| We con- 
clude with an efficient algorithm for computing a solution to the relaxation and 
show numerical examples in section [8] that demonstrate basic properties of our 
approach. 

2 Prior Work 

Image regularization by variational methods is a powerful and commonly used 
tool for denoising, inpainting, labeling and many other applications. As a case 
study in connection with ([I]), we consider one of the most used approaches for 
denoising, namely the Rudin, Osher and Fatemi (ROF) model from |16| : 

min \\u- u \\ 2 + a r TV{u), (2) 

where Uo is the input image, TV denotes the Total Variation and BV is the 
space of functions of bounded variation. The minimization problem ^ is convex 
and can be solved to a global optimum efficiently by various sparse first-order 
algorithms even for large problem sizes, e.g. by Primal-Dual methods, see [4], 
or other algorithms for nonsmooth convex optimization like |13j . which we will 
use in our paper. 

We can also use more general data terms instead of the quadratic term in 
For example in [TT] it is shown how the data term can be replaced by an 
(almost) arbitrary nonconvex function J Q f(u(x),x)dx. Still this data function 
is local and does not take into account global statistics. 

In the case that some prior knowledge is encoded as a histogram, the Wasser- 
stein distance and the associated Optimal Transport is a suitable choice for 
penalizing deviance from prior knowledge in a reasonable way. More generally 
the Wasserstein distance can be used as a distance on histograms over arbitrary 
metricized spaces. 

Regarding the Wasserstein distance and the theory of Optimal Transport, 
we refer to the in-depth treatise [T7] . Optimal Transport is well-known as Earth 
Mover's distance in image processing and computer vision |T5] and is used for 
content-based image retrieval. Further recent applications include [5] and |10) 
in connection with segmentation and [6] for texture models. 

Our histogram-based prior detailed below, employing the Wasserstein dis- 
tance, and its use as part of a single and general variational approach through 
convex relaxation, appears to be novel. 
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3 Problem and Mathematical Background 



We introduce the original non-convex model, consider different ways to write the 
Wasserstein distance and introduce the functional lifting technique for rewriting 
the resulting optimization problem to make it amenable for optimization. 

3.1 Problem Statement 

For an image domain fl C M 2 , e.g. fl — [0, l] 2 and u : O — > [0, 1], consider the 
following integral over Dirac measures: 

M u = pi fs v (x)dx, (3) 

where 

is the Dirac measure. ^ is the grey-value histogram of the image u, i.e. 

» u (A) = ~\u-\A)\, (5) 

where Ac [0, 1] and |£?| is the Lebesgue measure of a set BeSl. 
We would like to minimize the energy function 

min E(u) = [ f(u(x),x)dx + TV(u) + W(u u ,n°). (6) 

TV(u) is the Total Variation 

TV{u) =supl J u(x) ■ div c/)(x)dx : cj> G cl(n,R 2 ), \\</>\\l°° < l| (7) 

see [5] for more details. / : [0, 1] X Q — > R is a measurable but apart from that 
arbitrary fidelity function, and W is the Wasserstein distance 



W(ji, v) = inf / c dir. (8) 

"E-M (M> v )J [04] x [0,1] 

c : [0, 1] x [0, 1] — > M is the cost function for the Wasserstein distance, for example 
c(x, y) = \x — y\ p with p > 1. The space of transport plans is 

«/) - e P([0, 1] x [0, 1]) : 1^*1°; Z ^(B) VA ' 5 measurable}, 

(9) 

where V([0, 1] x [0,1]) is the space of all probability measures defined on the 
Borel-cr- Algebra over [0, 1] x [0, 1]. By Theorem 4.1 in [17] there exists a measure 
which minimizes ([8| and it is called the optimal transport plan. Q is a linear 
minimization problem subject to linear constraints and therefore convex. Note 
however that energy Q is not convex. 

By minimizing ^ we obtain a solution u which remains faithful to the data 
by the fidelity term /, is spatial coherent by the Total Variation term and has 
global grey- value statistics similar to /iq- 
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3.2 The Wasserstein Distance and its Dual 



We reformulate energy ^ to make it more amenable for numerical computation 
by introducing another way to obtain the Wasserstein distance. For this we 
recall Theorem 5.10 in |17j . which states that the following dual Kantorovich 
formulation equals the Wasserstein distance: 

W(fx u ,fi°) = sup f ^ u -f ip'dn . (10) 

{(V>,V) : 4:(x)--4,'(y)<c(x,y)} Jo Jo 

Define therefore 

£(u, = f f(u(x),x)dx + TV(u) + [ t[jdn u - f i/j'dp, (11) 

Jn Jo Jo 

and let 

C = BV(fi, [0, 1]) (12) 

be the space of functions of bounded variation with domain 57 and range [0, 1] 
and 

D = U ■ [0 11 -> R s t ^ {x) ~ V/(y) " C{X > V) Va; ' V € [ °' 1] \ (13) 
1 1,1 -0, -0' measurable J v ; 



It follows from ( 10 1 with the above definitions that 



min E(u) = min max E(u,tb,it>') (14) 
uec uec (w)eD 



3.3 Functional Lifting 

While the Wasserstein distance ^ is convex in both of its arguments, see The- 
orem 4.8 in |17j . (|6| is not convex due to the nonconvex transformation u i— > fi u 
in the first argument of the Wasserstein term and the possible nonconvexity of 
/. To overcome the nonconvexity of both the data term and the transformation 
in the first argument of the Wasserstein distance we lift the function u. Instead 
of u we consider a function <p below whose domain is one dimension larger. This 
extra dimension represents the range of u and allows us both to linearize the 
fidelity term and to convexity the Wasserstein distance. This technique, known 
as functional lifting or the calibration method, was introduced in pQ and is 
commonly used in many optimization problems. 
Let 

C" = {^:fix[0,l] ->{0,1} : <p(-,0) = l, 1) = 0, <9 2 < 0, <f> G BV}. (15) 
Every function u € C corresponds uniquely to a function <f> £ C via the relation 
u (hj) = y <^ -d 2 <p(x, •) = 5 y . (16) 
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Also for such a pair (u, (f>) we have the relation 

M" = M = |4 /j 5 ^' ■)!<** = W ^ -^20(x, ■)<*=. (17) 

In contrast to the transformation u h-> \i u , <fi i— > M is linear. 
Consider the energy 

E'(c/>,^') = / o 1 Ty(0(-,7))d7 + /o 1 LI 9 2^, 7)1/(7,^)^7 nsl 
+ / 1 V^-/ 1 W- 1 ^ 

For a pair (u, 0) as above the identity 

E{u^,^')=E'{^^') (19) 

holds true by the coarea formula 3.40, see [3J. Consequently, we have 

min max E(u,ib,ib) = man max E {(b.ih'ib). (20) 
uec (i/,,v>')e-D 4>ec> (i>,if,')eD 

Note that -E' is convex in ^ and concave in (ip,tp'). 

Remark 1. ^4s discussed in Section^ we merely consider total variation based 
regularization as a case study, but this restriction is not necessary. More general 
regularizers can be used as well. In the present paper however, we rather focus 
on the novel prior based on the Wasserstein distance. 

4 Relaxation as a Convex /Concave Saddle Point 
Problem 

Optimizing energy ([6| is not tractable, as it is a nonconvex problem. Also 



solving (20) is not tractable, as the set C is nonconvex. The latter can be 
overcome by considering the convex hull of C, which leads to a relaxation as a 
convex/concave saddle point problem for the minimization problem ([6]), which 
is solvable computationally. 

Proposition 1. Let 

C" = {4> : x [0, 1] -> [0, 1] : 0) = 1, </)(; 1) = 0, d 2 <t> < 0, G BV} (21) 
Then E' is convex/ concave and 

mmE(u) > min sup E'((f>, ip, ip'). (22) 

uec 06C" {^^i) &D 

1/ 

min max E(u, ip, ip') = max min E(u, ip, ip ) (23) 

uec (tI>,4>')£D {*l>,ip')&D u&C 

holds, then the above relaxation is exact. 
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Proof. Note that C" is a convex set, in particular it is the convex hull of C . E' 
is also convex in (j>, therefore the right side of (22) is a convex/concave saddle 
point problem. For fixed (tp, tp') we have the following equality: 



mm E(u, tp, tp') 



min E' (<j), tp, tp ) 

</>GC" 



which is proved in [11] . Thus 

min„ eC E(u) = 

(*) 
> 

(**) 



min M6C . max (v ,^/) SD E(u, ip, tp') 
max(^,^) eD min„ 6C E(u, ip, ip') 



max 



(ij>,4>')&D min^ e c" E'((p, ip, ip'), 



(24) 



(25) 



where (*) is always fulfilled for minimax problems and (**) is a consequence of 
(24). This proves (22). If (23) holds, then (*) above is actually an equality and 
the relaxation is exact. □ 



5 Relaxation with Hoeffding-Frechet Bounds 

A second relaxation can be constructed by using the primal formulation (JsJ) of 
the Wasserstein distance and enforcing the marginals of the distribution function 
of the transport plan to be yfi and fi° by the Hoeffding-Frechet Bounds: 

Theorem 1. Let F\,F 2 be two real distribution functions (d.f.s) and F a d.f. 
on M 2 . Then F has marginals F\ , F2 , if and only if 

{F x {xi) + F 2 {x 2 ) - 1)+ < F( Xl ,x 2 ) <mm{F 1 (x 1 ),F 2 (x 2 )} (26) 

Proof, see Theorem 3.1.1 in [T^] □ 

By (JsJ) the Wasserstein Distance with marginal d.f.s Fi,F 2 can be computed 
by solving the optimal transport problem and we arrive at the formulation 

W(dFi,dF 2 ) = min / c dF, s.t. F respects the Hoeffding-Frechet Bounds, 
F Jr2 

(27) 

where dF t shall denote the measure associated to the d.f. F i: i = 1,2. 

Using again the functional lifting technique of [TT], the Hocffding-Frechet- 



Bounds and the representation of the Wasserstein distance (27), we arrive at 
the following relaxation. 

min^ Jq 1 TV(<p(-, 7)^7 + J* J Q \d 2 cp(x, 7)! • f(j, x)dxdj + J R2 c dF, 
s.t. F<t,(x) = J Q $*\d 2 cp{x,i)\didx, 

F^o{x) = ^d^, (28) 
F${x\) + F /Jf o(x 2 ) - 1 <F(xi,x 2 ) < mmlF^xi), F fl o(x 2 )} 

<t> e c" 
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(28) is a relaxation of Just set 



1, u(i,j)<l 
0, u(i,j)>j 



and let F be the d.f. of the optimal transport measure with marginals fi u and 



Remark 2. It is interesting to know, when relaxation (28) is exact. By the 
coarea formula flty we know that 

/ o 1 Ty(0(., 7 ))d 7 + / o 1 Ll 5 2^,7)l ■ f(7,x)dxd 1 (2Q) 



= f TV(u a )da + f Q f n f(u a (x),x)dxda, 



where u a coresponds to (f> a = l{0 >ce } via relation |16j). However such a formula 
does not generally hold for the optimal transport: Let 

<t>* = 1 {<f»a} (30) 

and let F a be the d.f. of the optimal coupling with marginal d.f.s F ( j }a and FnO. 
Then 

F= I F a da (31) 



has marginal d.f.s F,p a da and F^o, but it may not be optimal. 

A simple example for inexactness can be constructed as follows: Let the data 
term bed = and let fj,Q — -^(Sq+Si) and let the cost for the Wasserstein distance 
be c(x, y) = X\x — y\. Every constant function with u(i,j) = const € [0, 1] will be 
a minimizer if A is small enough. The objective value will be ^ . But relaxation 
(28) is inexact in this situation: Choose (f>(x,~f) = \ V7 G (0, 1) and the relaxed 
objective value will be 0. 



6 Relationship between the two Relaxations 

Both relaxations from Sections [3] and [5] seem to be plausible but seemingly 
different relaxations. Their different nature reveals itself also in the conditions 
for which exactness was established. While the condition in Proposition [T] is 
dependent the gap introduced by interchanging the minimum and maximum 



operation, relaxation ( 28 ) is exact if a coarea formula holds for the optimal 
solution. It turns out, that both equations are equivalent however, hence both 
optimality conditions derived in sections|4]and[5]can be used to ensure exactness 
of a solution to either one of the relaxed minimization problems. 



Theorem 2. The optima of the two relaxations (22) and (28) are equal. 
Proof. It is a well known fact that 

minmax(Xa;,y) + G{x) - F*{y) (32) 
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and 

min F{Kx) + G(x) (33) 

are equivalent, where G : X — > [0, oo] and F* : Y — >• [0, oo] are proper, convex, 
lower-semicontinuous (l.s.c.) functions, F* is the convex conjugate of the convex 
l.s.c. function F and X and Y are two real vector spaces, see [T3] for details. 
To apply the above result choose 

G(tj>)= [ TV{(j>{; 1 ))d 1 + [ [\d 2 <P( X ,j)\-f^,x)dxd-y + Lc"((t>), (34) 
jq Jo Jn 

F*(^)=/ <//<V + ^(^') (35) 

and 



X: W(Ox [0,1])^X([0,1]) 2 , 



where C" is defined by (21 ) and I? by (13), lc( ) an d ££>(•) denote the indicator 



functions of the sets C" and D respectively and A4([0, 1]) denotes the space of 



relaxation (22) 



measures on [0, 1]. (32) corresponds with the above choices to the saddle point 



Recall that F = (F*)* , i.e. F is the Legendre-Fenchel bidual of itself, see 
|14j . Hence, for positive measures /j,, v, the following holds true: 

F(n,v) = sup^ iV ,,{ /„ ipdfi- f ip'di>- F*(i/},ip')} 

= sup Wi#)e£) {/ ipdfi - f ip'dv - J Q V'V} 

= M M ,// + M °) (37) 

(*) f W(a*,A*°) ,f = 

1 oo , otherwise 

where cr^(a;) = sup aGj4 (a, x) is the support function of the set A. To prove (*), 
we invoke Theorem 5.10 in [T7], which states that 

<jd(i~<>, v) = sup / ipd[i— ip'i>= min / c(x, y)dir(x, y) = W(ji, v), 
(i/>,il>')eD J o io Ten(/i,f) yj 0)1 ]2 

(38) 

and we have infinity for measures which do not have the same mass. 



Thus ( 33 1 can be written as 



mhxG(<j>) + a D (-^- / d 2 cj ) (x, -)dx, M °). CWi 



This problem is relaxation ( 28 ) , which concludes the proof. □ 



7 Optimization 

We present five experiments and the numerical method used to compute them. 
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7.1 Implementation 

First, we discretize the infinite dimensional set C" and denote it by C'^. Hence 
we consider only finitely many grey values, which an image can take. Second 
we discretize the image domain ft to be {1, . . . , n\] X {1, . . . , 71%) and use as the 
gradient operator forward differences. 

We use the Generalized Forward-Backward Splitting algorithm as in [T3] for 



solving the relaxation (22) 



Algorithm 1: The Generalized Forward-Backward Splitting Algorithm 
Data: Oi)ie{i,...,fc} E H, 

(^i)ie{l,...,k}' S - t -T,i=l UJ i = 1 

Result: x optimal for the energy (/, x) + Yl7=i Gi{x) 
Initialize: 

t <- 

Main iteration: 
repeat 

for i ■<— 1 to k do 

| z t <- Zi + prox_i 7Gi (2x — Zi — /) — x 

end 

t<-t + l 

until convergence; 

Here T~L — C'J x (C'J) 2 . The second component of T-L holds the gradient of 4>. 
The convex functions Gi are: 

• G !_{<!>, g) = <5{v0= ff } 

• G 2 (g) = \\g\h 

• G 3 (4)=6c44>) 

. G A (4>) = W(^,fi°) 

Gi and G2 are splitting the TV-term, as is customary, see Section 6.1.3. in [T3"] . 

The Generalized Forward-Backward Splitting algorithm requires to compute 
efficiently the proximity operators 

prox G . (x) = argmin^/ - ||x — x'\\ 2 + Gi(x'). (40) 

prox Gi is the shrinkage operator and prox G2 is the projection onto the set 
{Vcj) — g}. The latter can be efficiently computed with Fourier transforms, 
see again [13]. 

prox G is the projection onto the set of non-increasing sequences C". To 
compute this projection, we employ the algorithm proposed in [3], Appendix D. 
It is trivially parallelisable and converges in a finite number of iterations. 

Last, the proximity operator for the Wasserstein distance can be computed 
efficiently in some special cases, as discussed next. 
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7.2 Wasserstein Proximation for c(x,y) = \x — y\ by shrink- 
age 

In general, computing the proximity operator for the Wasserstein distance can 
be expensive and requires solving a quadratic program. However, for the real 
line and convex costs, we can compute the proximity operator more efficiently. 
One algorithm for the cost function c(x, y) = \x — y\ is presented below. 
The proximation for the weighted Wasserstein distance is 

argmin l -\\cjP - tjffo + XW(^,fX°). (41) 

For the special case we consider here, there is a simple expression for the Wasser- 
stein distance: 

Proposition 2. For two measures [i^v on the real line and c(x,y) — \x — y\, 
the Wasserstein distance is 



W(p,v)= \F^(x) - F^(x)\dx (42) 

JR 

Proof. Let the space of 1-Lipschitz functions on the real line be denoted by 

L = {t|i:I^I : \t/j(x) - ip(y)\ < \x - y\ Vx,y G R} (43) 



The Kantorovich dual of the Wasserstein distance ( 10 1 on the real line with 
cost function c(x,y) — \x — y\ for two histograms /i 1 and /i 2 with d.f. F^i and 
F^2 reads, by Remark 5.4. in [17], as follows: 

max WeL} f R ip dy} - f R ip d[i 2 
= max WEi ) f R ip(x)d x F /J i(x) - J R ip(x)d x F fl 2 (x) 
= maxf^j,} f R ip(x)d x (F ll i(x) ~ F^(x)) 
= max Wei} J R d x ip(x)(F fl i(x) - F /1 2(x)) dx 
= max { | p |< 1} J R p(x)(F^i(x) - F^(x))dx 
= f R \F^(x) - F^(x)\dx 



(44) 



When applying integration by parts above, no boundary terms occur because 
F„i (x) — F^2 (x) vanishes asymptotically. □ 

Due to d x cj)(i,j, x) < and cj>(x, 0) = 1, we can also write F^^) as 

F ^^ = f Q W\ Jj9 2 <p(x,r])\dxdri = J^l - cf)(x,j)dx (45) 

Next we show how to analytically solve the proximity operator for the 
Wasserstein distance in the present case. 

Proposition 3. Given <fr , A > 0, the optimal <fi for the proximity operator 

4> = argmin ±\\</, - 0° 1 1 1 + XW{F^,n°) (46) 
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is determined by 

00,7) = 0(^,7) + c 7> ( 47 ) 

c 7 = s/irmA; f— y <Po{x,l)dx - ^0(7) + 1, ^y^+j^j ^ 00 ( x > 7)^+-^° (t)- 1 

(48) 

and shrink denotes the shrinkage operator defined componentwise by 

shrink(a, A) 1 = (\a l \ - A)+ • sign^) (49) 

for a G R n , A > 0. 

Proof. By proposition [2] and the characterisation of F^ in ( 45 1 , proximation 
(41) reads 

argminj 



l^-nl + xJji-Q^Jjix^dx^-F^idr (50) 



Note that (50) is an independent optimization problem for each 7. Thus, for 



each 7 we have to solve the problem 
argmin 0( . i7) i||0 o (., 7 )-0(.,7)||2 + A|l- (±- f^( xn )dx) -i>( 7 )|. (51) 



It can be easily verified that the solution to problem (51 1 is </> (•, 7) + c 7 , where 



c 7 eR and 

c 7 g argmin ceR i|fi|c 2 + A|p r / .."(.,■.-. vl.r + < ! I (52) 

and hence 



1! 



'■- : J-luink [ ^cj) (x,-f) - F M o(7) + 1, + J^a{x^)dx^F^(j)-l. 

(53) 
□ 



Concluding, the cost for the Wasserstein proximal step is linear in the size 
of the input data. For the discretized problem a similar computation holds. 

8 Numerical Experiments 

We want to show experimentally 

1. that computational results conform to the mathematical model, 

2. that the convex relaxation is reasonable. 
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Note that we do not claim to achieve the best denoising or inpainting results 
and we do not wish to compete with other state-of-the-art methods here. We 
point out again that the Wasserstein distance can be used together with other 
variational approaches to enhance their performance, e.g. with nonlocal total 
variation based denoising, see [7]. 

In the first experiment we compare total variation denoising and total 
variation denoising with the Wasserstein term for incorporating prior knowledge. 
The data term is /(x, s) = A • ||wo(^) — s|| 2 , where uq is the noisy image in figure 
[TJ The cost for the Wasserstein distance is c(x, y) = v \x — y\, v > 0. To ensure 
a fair comparison, the parameter A for total variation regularization without 
prior knowledge was hand-tuned in all experiments to obtain best results. The 
histogram was chosen to match the noiseless image. See figure [l] for the results. 

Note the trade-off one always has to make for pure total variation denois- 
ing: If one sets the regularization parameter A high, the resulting grey-value 
histogram of the recovered image will be similar to the noisy input image and 
generally far away from the histogram of ground truth. By choosing lower data 
fidelity and higher regularization strength we may obtain a valid geometry of 
the image, however then the grey-value histogram tends to be peaked at one 
mode, as total variation penalizes scattered histograms and tries to draw the 
modes closer to each other, again letting the recovered grey-value histogram 
being different from the desired one. 

The second experiment is a more serious denoising experiment. Notice that 
again pure total variation denoising does not preserve the white and black areas 
well, but makes them grey, while the approach with the Wasserstein distance 
preserves the contrast better, see figure [5J 

In the third experiment we compare image inpainting with a total varia- 
tion regularization term without prior knowledge and with prior knowledge, see 
figure [3] for the results. The region where the data term is zero is enclosed in 
the blue rectangle. Outside the blue rectangle we employ a quadratic data term 
as in the first exeriment. Total variation inpainting without prior knowledge 
does not produce the results we expected, as the total variation term is small- 
est, when the grey color fills most of the area enclosed by the blue rectangle. 
Heuristically, this is so because the total variation term weighs the boundary 
length multiplied by the difference between the grey value intensities, and a 
medium intensity minimizes this cost. Thus the TV-term tends to avoid inter- 
faces, where high and low intensities meet, preferring smaller intensity changes, 
which can be achieved by interfaces with grey color on one side. Note that also 
the regularized image with the Wasserstein term lacks symmetry. This is also 
due to the behaviour of the TV-term described above. 

In the fourth experiment we consider inpainting again. Yevgeni Khaldei, 
the photographer of the iconic picture shown on the left of figure [4] had to 
remove the second watch. Trying to inpaint the wrist with a TV-regularizcr 
and a Wasserstein term results in the middle picture, while only using a TV- 
regularizer results in the right picture. Clearly using the Wasserstein term helps. 

In the fifth experiment we have a different setup. The original image is on 
the left of figure [5] The histogram fi° was computed from a patch of clouds, 
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Input image TV+ Vtosserstein TV only 




^■■^^■^H^^^l ^^MM^HH^^^HI ^^hhh^^^^^hh 

(a) Tiger denoising experiment with the clean image on the left, the image denoiscd with the 
Wasserstein term in the middle and the standard ROF-model on the right. 




(b) Detailed view of the tiger denoising experiment revealing that contrast is better preserved 
when the Wasserstein term is used. 

Figure 2: Tiger denoising experiment 



original image TV + Wasserstein TV 




Figure 3: Inpainting experiment with the original image and the inpainting 
area enclosed in a blue rectangle on the left, the inpainting result with the 
Wasserstein term in the middle and the result where only the TV-regularizer is 
used on the right. By enforcing the three regions to have the same size with the 
Wasserstein term, we obtain a better result than with the Total Variation term 
alone. 
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TV + Wasserstein 



Figure 4: Here we want to inapint the area occupied by the watch of the soldier, 
see the second left image. Our approach, on the second right image gives better 
results again than the approach with TV alone. 



input image 



TV + Wasserstein 



TV only 






Figure 5: Unsupervised inpainting using empirical measures as priors. Objects 
not conforming to the prior statistics are removed without labeling image re- 
gions. 



which did not include the plane. The data term is f(x,y) = Amin(|M (a;) — 
y\ 2 ,a), where a > is a threshold, so the data term does not penalize great 
deviances from the input image too strongly. The Wasserstein term penalizes 
the image of the plane whose appearance differs from the prior statistics. The 
TV-regularizer is weighted weaker than in the previous examples, because we 
do not want to smooth the clouds. 

Note that unlike in ordinary inpainting applications, we did not specify the 
location of the plane beforehand, but the algorithm figured it out on its own. The 
total variation term finally favors a smooth inpainting of the area occupied by 
the plane. In essence we have combined two different tasks: Finding out where 
the plane is and inpainting that area occupied by it. See figure [5] for results. 



9 Conclusion and Outlook 

We have presented in this paper a novel method for variational image regular- 
ization, which takes into account global statistical information in one model. 
By solving the relaxed nonconvex problem we obtain regularizd images which 
conform to some global image statistics, which sets our method apart from stan- 
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dard variational methods. Moreover the computational cost for the Wasserstein 
term we introduced is negligible, however our relaxation is not tight anymore 
as in models without the latter term. Still, the relaxation is reasonably tight. 

Our future work will consider extensions of the present approach to mul- 
tidimensional input data and related histograms, e.g. based on color, patches 
or gradient fields. The theory developed in this paper regarding the possible 
exactness of solutions does not carry over without modifications to such more 
complex settings. Moreover, it is equally important to find ways related to our 
present work to minimize such models efficiently. 
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